12/16/2019

Yelp Restaurant Rating and Review Analysis

- Focus on Japanese restaurants in three main cities: New York, Dallas and San Francisco.
- Data from Yelp API.

Introduction

This project explores the ratings and reviews of Japanese restaurants in three of the most popular cities in the US: New York, San Francisco and Dallas.

  • First, we will explore the rating distributions of restaurants in the different cities.

  • Then we will dive into their geographical distribution to see how they cluster on a map. (Please refer to the Shiny App.)

  • Finally, we will explore the review data to find the most frequent words in reviews of highly rated restaurants, and build sentiment word clouds to see how positive and negative words are distributed.

Data Preparation

  • Extract data via Yelp API
  • Restaurant/Business data
  • Review data

key <- textreadr::read_rtf("API.rtf")

Location: New York City, San Francisco, Dallas

Categories : Restaurants

Term: Japanese

** For the data-reading code, please refer to the read_data.R file.
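
As a rough illustration of the search described above, the sketch below builds the query for each city and shows how one request could be sent to the Yelp Fusion business search endpoint. It is not the code in read_data.R: the helper names `yelp_search_params()` and `yelp_search()` are hypothetical, and the use of the httr package is an assumption.

```r
# Sketch only: the real extraction code lives in read_data.R, which is not shown here.
# yelp_search_params() and yelp_search() are hypothetical helper names.

# Build the query parameters for one page of search results.
yelp_search_params <- function(location, offset = 0, limit = 50) {
  list(
    term       = "Japanese",
    categories = "restaurants",
    location   = location,
    limit      = limit,   # Yelp returns at most 50 businesses per request
    offset     = offset   # used to page through the results
  )
}

# Send one search request (assumes the httr package and a valid API key).
yelp_search <- function(key, location, offset = 0) {
  resp <- httr::GET(
    "https://api.yelp.com/v3/businesses/search",
    httr::add_headers(Authorization = paste("Bearer", key)),
    query = yelp_search_params(location, offset)
  )
  httr::stop_for_status(resp)
  httr::content(resp, as = "parsed")$businesses
}

# Query parameters for the three cities; no request is sent here.
cities <- c("New York City", "San Francisco", "Dallas")
params <- lapply(cities, yelp_search_params)
```

Calling `yelp_search(key, "Dallas", offset = 50)` would then fetch the second page of results for Dallas, given a valid key.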

EDA

- Summary of the restaurant data and average review ratings by city.

Japanese Food Restaurants
city    total_business  total_reviews  avg_rating
NYC     999             331790         4.02
Dallas  569             127633         3.84
SF      922             493271         3.78
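
A summary of this shape can be produced with a per-city aggregation. The sketch below uses toy data; the input column names (`rating`, `review_count`) and the choice of a simple mean over businesses for `avg_rating` are assumptions, not confirmed details of the project.

```r
# Toy business-level data; column names and values are illustrative assumptions.
business <- data.frame(
  city         = c("NYC", "NYC", "Dallas", "Dallas", "SF"),
  rating       = c(4.5, 4.0, 3.5, 4.0, 4.0),
  review_count = c(120, 80, 40, 60, 200)
)

# One row per city: number of businesses, total reviews, mean rating.
summary_by_city <- do.call(rbind, lapply(split(business, business$city), function(d) {
  data.frame(
    city           = d$city[1],
    total_business = nrow(d),
    total_reviews  = sum(d$review_count),
    avg_rating     = round(mean(d$rating), 2)
  )
}))
rownames(summary_by_city) <- NULL
print(summary_by_city)
```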

EDA

Ratings Score Distribution

EDA

Ratings Score Distribution By City

EDA

Ratings Score Distribution By Price Level

EDA

Most ratings are concentrated around a score of 4. The average Japanese restaurant rating in New York (4.02) is higher than in Dallas (3.84) and San Francisco (3.78).
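
The distribution behind this observation can be tabulated directly. The following is a minimal sketch on toy data (the values are illustrative, not the project's ratings) that counts restaurants per rating score and city and finds the most common score.

```r
# Toy data; values are illustrative, not the project's ratings.
ratings <- data.frame(
  city   = c("NYC", "NYC", "NYC", "Dallas", "Dallas", "SF", "SF", "SF"),
  rating = c(4.0, 4.5, 4.0, 3.5, 4.0, 4.0, 3.5, 4.0)
)

# Count of restaurants at each rating score, per city.
rating_counts <- table(ratings$city, ratings$rating)

# Share of each score within a city (each row sums to 1).
rating_share <- prop.table(rating_counts, margin = 1)

# Most common score across all cities.
overall <- table(ratings$rating)
modal_rating <- as.numeric(names(overall)[which.max(overall)])

print(round(rating_share, 2))
print(modal_rating)
```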

Mapping

  • Blue (smaller)–Rating <= 3.5
  • Orange–Rating = 4
  • Red–Rating >= 4.5
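
The legend above amounts to a simple mapping from rating to colour. A minimal sketch, assuming ratings move in half-star steps so that 3.5, 4 and 4.5 are adjacent values; the function name `rating_colour()` is hypothetical and not taken from the Shiny App.

```r
# Map a rating to the marker colour described in the legend.
# rating_colour() is a hypothetical helper name.
rating_colour <- function(rating) {
  ifelse(rating <= 3.5, "blue",
         ifelse(rating >= 4.5, "red", "orange"))
}

colours <- rating_colour(c(3.0, 3.5, 4.0, 4.5, 5.0))
print(colours)
```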

Text Mining

Frequent words in New York Japanese restaurant reviews

Text Mining

Frequent words in San Francisco Japanese restaurant reviews

Text Mining

Frequent words in Dallas Japanese restaurant reviews
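
Word frequencies like those on the slides above can be computed by tokenising the review text and counting. The sketch below uses base R only; the toy reviews and the short stop-word list are illustrative assumptions, and the project itself may have used a dedicated text-mining package.

```r
# Toy reviews; the text and the short stop-word list are illustrative assumptions.
reviews <- c(
  "The sushi was fresh and the service was great",
  "Great ramen, great service",
  "Fresh sushi and friendly staff"
)
stop_words <- c("the", "was", "and", "a", "of")

# Lower-case, strip punctuation, split on whitespace.
words <- unlist(strsplit(gsub("[[:punct:]]", "", tolower(reviews)), "\\s+"))
words <- words[nchar(words) > 0 & !(words %in% stop_words)]

# Count each word and sort from most to least frequent.
word_freq <- sort(table(words), decreasing = TRUE)
print(head(word_freq, 5))
```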

Text Mining

  • a From the two slides below, we can see that word frequencies in reviews of highly rated restaurants are more strongly correlated between NYC and Dallas than with San Francisco.
## 
##  Pearson's product-moment correlation
## 
## data:  proportion and Dallas
## t = 49.969, df = 674, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8702022 0.9024031
## sample estimates:
##     cor 
## 0.88738

Text Mining

  • b
## 
##  Pearson's product-moment correlation
## 
## data:  proportion and Dallas
## t = 70.971, df = 872, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9128449 0.9324908
## sample estimates:
##       cor 
## 0.9232693
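
Output of this form comes from `cor.test()` applied to per-city word proportions. The sketch below reproduces the pattern on toy data; the words, counts and column layout are illustrative assumptions, with the other city's frequencies held in a `proportion` column as the `data:` line in the output suggests.

```r
# Toy word counts per city; words and counts are illustrative assumptions.
counts <- data.frame(
  word   = c("sushi", "ramen", "fresh", "service", "great", "roll"),
  NYC    = c(50, 30, 20, 25, 40, 10),
  Dallas = c(45, 20, 22, 30, 35, 15)
)

# Convert counts to within-city proportions.
freq <- data.frame(
  word       = counts$word,
  proportion = counts$NYC / sum(counts$NYC),
  Dallas     = counts$Dallas / sum(counts$Dallas)
)

# Pearson correlation between the two cities' word proportions.
result <- cor.test(~ proportion + Dallas, data = freq)
print(result)
```

The degrees of freedom reported by the test equal the number of words compared minus two, so only words present in both cities' frequency tables contribute.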